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Casella and Robert (1996) presented a general Rao-Blackwel- 
lisation principle for accept-reject and Metropolis-Hastings schemes 
that leads to significant decreases in the variance of the resulting esti- 
C*j mators, but at a high cost in computing and storage. Adopting a com- 

^ pletely different perspective, we introduce instead a universal scheme 

5 that guarantees variance reductions in all Metropolis-Hastings based 

y—{ estimators while keeping the computing cost under control. We es- 

tablish a central limit theorems for the improved estimators and il- 
^_ ' lustrate their performances on toy examples and on a probit model 

L / estimation. 

u 

1. Introduction. As its accept-reject predecessor, the Metropolis-Has- 
tings simulation algorithm relies in part on the generation of uniform vari- 
ables to achieve given acceptance probabilities. More precisely, given a target 

*0 density / wrt to a dominating measure on the space X, if the Metropo- 

lis-Hastings proposal is associated with the density q(x\y) (wrt the same 

■^j- dominating measure), the acceptance probability of the corresponding Me- 

tropolis-Hastings iteration at time t is 

S q(^ = min/l^ ^ ( M 

q k 

J> when yt ~ q(yt\x^ t ' > ) is the proposed value for x^ t+1 \ In practice, this means 

that a uniform ut ~ 14(0, 1) is first generated and that x^ t+1 ^ = yt if, and 
only if, u t <a(x {t \y t ). 

Since the uniformity of the u^s is an extraneous (albeit necessary) noise, 
in that it does not directly bring information upon the target / (but only 
through its acceptance rate), Casella and Robert (1996) took advantage of 
this flow of auxiliary variables ut to reduce the variance of the resulting 
estimators while preserving their unbiasedness by integrating out the u^s 
conditional on all simulated y^s. This strategy has a non- negligible cost of 
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0(iV 2 ) for a given sample of size N. While extensions have be proposed in 
the literature (Casella and Robert, 1998, Perron, 1999, see also Delmas and 
Jourdain (2009) for an analysis of a Rao-Blackwellized version of the estima- 
tor when conditioning on the rejected candidates), this solution is therefore 
not considered in practice, in part due to this very cost. The current pa- 
per reproduces the Rao-Blackwellisation argument of Casella and Robert 
(1996) by an independent representation that allows to reduce the variance 
at a fixed computing cost. Section 2 exposes the Rao-Blackwellisation tech- 
nique and Section 3 validates the resulting variance reduction, including 
a derivation of the asymptotic variance of the improved estimators, while 
Section 4 presents some illustrations of the improvement on toy examples. 

2. The Rao Blackwellisation solution. When considering the out- 
come of a Metropolis-Hastings experiment, (x^)t, and the way it is used in 
Monte Carlo approximations, 

1 N 

(1) <5=-J>(* W ), 

t=i 

alternative representations of this estimator are 

j JV ( M 

6= nY,Y1 h (yj) J xw= yj and 6 = x Yl nih ^ ' 

t=l j=l i=l 

where the yj's are the proposed Metropolis-Hastings moves, the jj's are the 
accepted yj's, M is the number of accepted yfs till time N, and rij is the 
number of times fa appears in the sequence (x^)t- The first representation 
is the one used by Casella and Robert (1996), who integrate out the random 
elements of the outer sum given the sequence of yt's. The second represen- 
tation is also found in Gasemyr (2002), Sahu and Zhigljavsky (1998, 2003), 
and Malefaki and Iliopoulos (2008), and is the basis for our construction. 

Let us first recall the basic properties of the pairs (jijttj), also found in 
the above references: 

Lemma 1. The sequence (3i,ttj) satisfies 

1. (3j,rij)j is a Markov chain; 

2. and n» are independent given fa; 

3. rij is distributed as a geometric random variable with probability pa- 
rameter 

(2) p(h) ■= / a{i i} y)q{y\ii)dy- 
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4- (h)i is a Markov chain with transition kernel Q($,dy) = q{y\l)dy and 
stationary distribution n such that 

q(-\i) oc a(i, -)q(-\i) and vr(-) oc ir(-)p(-) . 

Proof. We only prove the last point of the lemma. The transition ker- 
nel density q of the Markov chain is obtained by integrating out the 
geometric waiting time, namely q{-\}i) = a(fa, •) q(-\u) / ' p(fa) . Thus, 

-/ w i n tt(x)p(x) a(x,y)q{y\x) 

■K{x)q(y\x) = — = ir{y)q(x\y) , 

J ir(u)p(u)du p(x) 

where we have used the detailed balance property of the original Metropo- 
lis-Hastings algorithm, namely that ir(x)q(y\x)a(x, y) = n(y)q(x\y)a(y, x). 
This shows that the chain (3^)^ satisfies a detailed balance property with 
respect to n, thus that it is 7f-reversible, which concludes the proof. □ 

Since the Metropolis-Hastings estimator 5 only involves the jj's, i.e. the 
accepted j/t's, an optimal weight for those random variables is the importance 
weight l/p(3i), leading to the corresponding importance sampling estimator 

a* = — v 

but this quantity is usually unavailable in closed form and needs to be es- 
timated by an unbiased estimator. The geometric rij is the obvious solution 
that is used in the original Metropolis-Hastings estimate, but solutions with 
smaller variance also are available, as shown by the following results: 

Lemma 2. If (yj)j is an iid sequence with distribution q(y\fo), the quan- 
tity 

00 

3=1 e<j 

is an unbiased estimator ofl/p^fa) which variance, conditional on%i, is lower 
than the conditional variance ofxii, {1 — p{%i)} / p 2 (li) ■ 

Proof. Since m can be written as 

00 

= l + > a(fo,ye)} , 
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where the Uj's are iid U(0,1), given that the sum actually stops with the 
first pair (uj,yj) such that Uj < a($i,yj), a Rao-Blackwellised version of rij 
consists in its expectation conditional on the sequence (yj)j- 



3=1 



(y. 



t t>i 



j=i i<j 

oo 

= 1 + EII{ 1 - a ^)} • 

3=1 e<j 

Therefore, since £j is a conditional expectation of tij, its variance is neces- 
sarily smaller. □ 

We note that this unbiased estimate of l/p{ii) can be related to the 
Bernoulli factory approach of Latuszynski et al. (2010), in that we are only 
using Bernoulli events in this derivation. 

Given that cx{li,yj) involves a ratio of probability densities, a($i,yj) takes 
the value 1 with positive probability and the sum £j is therefore almost surely 
finite. This may however requires far too many iterations to be realistically 
computed or it may involve too much variability in the number of iterations 
thus required. An intermediate estimator with a fixed computational cost is 
fortunately available: 

Proposition 1. If (yj)j is an iid sequence with distribution q(y\ii) and 
(uj)j is an iid uniform sequence, for any k > 0, the quantity 



(3) g=i+Y, n u n Hui><*(*,vt)} 

j=l l<£<kAj k+l<t<j 

is an unbiased estimator of l/p($i) with an almost sure finite number of 
terms. Moreover, for k > 1, 



V 



l-j>(3») l-{l-2p{ h ) + r{ h )) k (2-p{i 



p 2 (h 



Mfo) ~ r (h 



p 2 (h 



(p(to)-r(to)) , 



where p is defined in (2) and r(fc) := f a 2 (fc,y) q{y\li) dy. Therefore, we 
have 



¥ 



6 



3i 



<¥ 



hi 



< ¥ 



h 



v[m|3i] . 
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The truncation at the k-th. proposal thus allows for a calibration of the 
computational effort since Q costs on average k additional simulations of 
Uj and computations of u($i,yj) to compute when compared with the 
regular Metropolis-Hastings weight ttj. 

PROOF. Define y = {yj)j>i and Uk-oo = (ue)e>k- Note that £° = rij and 
therefore, the conditional variance of is the variance of a geometric vari- 



able. Now, obviously £ 



fc+i 



E 



Sj 



fo,y,Uk+2:oc 



; thus, we have 



V 



fc+i 



ii 



+ E 



¥ 



iiiVi Uk+2:oo 



To get a closed-form expression of the second term of the rhs, we first intro- 
duce a geometric random variable T k defined by 

oo 

Tk = l + ^2Y[l{u k+i > a(zi,y k +e)} ■ 
j=i e<j 

Then, by straightforward algebra, £f may be rewritten as 



ii = C + I J| {1 - T k+2 I{u k+1 > afa, y k+ i)} 



\e=i / 
where C does not depend on u\, . . . , u k+ \. Thus, 

/ k \ 



V 



ii \fo,y, U k+2 :c 



Y[{1 - a{ii,yj)} 2 Tl +2 a{n,y k+1 ){l - a{^,y k+1 )} , 



Taking the expectation of the above expression, we obtain 

k ( 2 -p{li) 



E 



(v [|* 



hiVi Uk+2:oo 



) =(l-2p( 3i ) + r(3 i ))* 



which concludes the proof. 



□ 



3. Convergence properties. Using those Rao-Blackwellised versions 
of 5 brings an asymptotic improvement for the estimation of E vr [/i(X)], as 
shown by the following result which, for any M > 0, compares the estimators 
(k > 0) 



efc 
°M 



2-fi=i Si 
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For any positive function (p, we denote C v = {h; \h/(p\oo < 00} the set of 
functions bounded by ip up to a constant and we assume that the reference 
importance sampling estimator is sufficiently well-behaved, in that there 
exist positive functions ip > 1 and ip such that 

(4) V/i G C v , M > nh) 

Ei=i l /p(h) 

(5) vfcec,, Vm( E ^ { ^ M ^ -n(h)) AAA(o,rW) 

V 2Zi=i 1 /p(M) J 

Theorem 1. Under the assumption that ir(p) > 0, the following con- 
vergence properties hold: 

i) If h is in dp, then 

k ^ 

$M — >m->oo Tr(h) 
ii) If in addition, h 2 /p G Cp and h G C^, then 



(6) VM(5 k M - ir(h)) ^m^oo M (0, V k [h - ir(h)\) , 

where V k (h) := 7r(p) / vr(d 3 )V [|f | 3 ] ^(3)^(3) + T(h). 
PROOF. We will prove that, for all g G C v , 

M 

(7) M-^iidih) -^ir(g)Mp). 

i=l 

Then, i) directly follows from (7) applied to both g = h and g = 1. Now, de- 
note by Ti the cr-field Ti := cr(3 1; . . . , 3i+l, Ii , • ■ • , if)- Since E ifg(u) T-i 
g(fa)/p(h), w e have 

M / M \ Af 

^ 1 ^ iidihi) = S%-E [t^,i| Ji-i] + M- 1 5^5(3i)M3i)- 

i=l \i=l / i=l 

with C/jy,i := M^^gfei). First consider the second term of the rhs. Since 
(/? > 1, the function p is in C v ; then, Eq. (4) implies that M/{^2 i=1 l/p( 3 i)} — 
7r(p) > and therefore that 

M 

(8) V 9 GC„ M" 1 ^^)/^) A 7r( 5 )/7r(p) . 

i=i 
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It remains to check that Xa=i Um,i— E [J7jvf,i| ^i-i] 0- We use asymptotic 
results for conditional triangular arrays of random variables given in Douc 
and Moulines (2008, Theorem 11). Obviously, since \g\ € C^, 



M 



M 



[\U M A\^-i] = M' 1 \9(U)\/P(M) A *(M)Mp) , 



i=i 



i=l 



and we only need to show that YldLi E [|^Af,i|^{I^Af,i| > e }| -^i-i] 0. Let 
C > and note that {\U M>i \ > e} C {\gfa)\ > (eAfj/C} U {|f > C}. Using 



again E 

A* 



g(h)/p(fo), w e have 



(9) J^E [|i/j W f,i|I{|C^f,i| > e}\?i-l] < 



i=l 



1 ^ |g( 3i )|I{|g( 3i )|>(6M)/C} 1 ^F c ( 3t ) 



.1/ 



P(3i) ' 

with F C (M) ■= l»(3i)|E > C}| 3i] p(3i). Since F c < we have F c G 

Cp. Then, using again (8), 



1 M 

-Y 



\g{u)\I{\g{n)\>{eM)/C} P , 



M 
1 

M 



i=i 

At 



E 



£ P(3») 



P(3i) 

» vr(F c )/7r(p) 



0, 



which can be arbitrarily small when taking C sufficiently large. Indeed, using 
Lebesgue's theorem in the definition of Fc, for any fixed 3, limc_ s . 00 Fed) = 
and then, using again Lebesgue's theorem, limc^oo tt(Fc) = 0. Finally, 
(7) is proved. The proof of i) follows. 

We now consider ii). Without loss of generality, we assume that n(h) = 0. 
Write 

V M= m-. tEJ* ■ 

By (7), the denominator of the rhs converges in probability to l/ir(p). Thus, 
by Slutsky's Lemma, we only need to prove a CLT for the numerator of the 
rhs. Set U M ,i ■= M~^ 2 ^h($i) and write: 



M 



M 



M 



1=1 



vi=l 



1=1 
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Since h E and M~ X^=i ^/p{h) — ^ V^G 5 )) the second term, thanks 
again to Slutsky's lemma and Eq. (5), converges in distribution to A/"(0, 
T (h) / it 2 (p)) . Now, consider the first term of the rhs. We will once again use 
asymptotic results on triangular arrays of random variables (as in Douc and 
Moulines, 2008, Theorem 13). We have 

M 



J^E [Uh^Fi-i] - (E [UmA Fi-i}Y 



i=l 



M 



St 



p(hi) /p(h 



TT 



V 



If • h\.) P {.) /n( P ) 



i=i 



p{li) which 



by (8) applied to the non negative function h-> h 2 ($i)Y 

is in since it is bounded from above by /i 2 /p E C^. It remains to show 
that, for any e > 0, 



(10) 



M 



»l l \V M>i \>e 



i-l 



0. 



i=l 



Following the same lines as in the proof of i), note that for any C > 0, we 
have {\U M ,i\ >e}c {\h(u)\ > (eVM)/C} U {£% > C}. Using that 



E 



V 



+ E 



< 2/p 2 (fc), 



we have 



^E [\U M M\ U M,i\ >e}\T, 



i-l 



2 ^ > (eygygj- 1 ^F c ( 3i 

- 2^ p 2(^ + A/A 



M 



i=l 



^ p(3i 



withF c ( 3i ) := /i 2 ( 3l )E 



Since F c < (2h 2 )/p and 



h 2 /p E C^, we have Fc E C^. Then, using again Eq. (8), 

M 

E 



J_f - {h 2 (h)/p^)n\Hh)\ > (eVM)/C} 
M 



1 £ Fc(3, 



Af 



p{h) 
-k{F c )/tt{p) , 



0, 
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which can be arbitrarily small by taking C sufficiently large. Indeed, as in the 
proof of i), one can use Lebesgue's theorem in the definition of Fq so that for 
any fixed 3, limc^oc, Fcii) = 0. Then, using again by Lebesgue's theorem, 
limc^oo tt(Fc) = 0. Finally, (10) is proved. The proof of ii) follows. □ 

The main consequence of this CLT is thus that, asymptotically, the cor- 
relation between the £j's vanishes, hence that the variance ordering on the 
£i's extends to the same ordering on the 5m 's. 

It remains to link the CLT of the usual MCMC estimator (1) with the 
CLT expressed in (6) with k = associated with the accepted values. We 
will need some additional assumptions, starting with a maximal inequality 
for the Markov chain (ii)%. there exists a measurable function £ such that 
for any starting point x, 



111 V/ieC, 



sup Wij) ~ Hh)] 



0<i<N j=0 



> NC h (x) 



e 2 



where ¥ x is the probability measure induced by the Markov chain (3i)j>o 
starting from 30 = %■ 

Moreover, we assume that there exists a measurable function <j> > 1 such 
that for any starting point x, 

(12) V/i G C , Q n (x, h) A 77(h) = 7r(ph)/Tr(p) , 

where Q is the transition kernel of (}i)i expressed in Lemma 1. 



Theorem 2. In addition to the assumptions of Theorem 1, assume that 

h/pi 



h is a measurable function such that h/p G and {C^/p, h 2 /p 2 } C C, 



Assume moreover that 

c 



M (5° M - 7r(h)) ^ AT(0, V [h - n(h)}) 
Then, for any starting point x, 

1 

N 

where Mn is defined by 



VMn f ^= l /l(xW) - rr(h) ) A^oo AA(0, V [h - 7r(h))) , 



M N Mjv + 1 

(13) E^<^v< E 8- 



i=i i=i 
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Proof. Without loss of generality, we assume that vr(/i) = 0. In this 
proof, we will denote by F x (resp. E x ) the probability (resp. expectation) 
associated to the Markov chain (x^)t>o starting from a fixed point x. Using 

(7) with g = 1, one may divide (13) by Mn and let N go to infinity. This 

p 

yields that M^/N — > ir(p) > 0. Then, by Slutsky's lemma, Theorem 2 will 
be proved if we are able to show that 

N ^ Ef =^ (xW) - n(h)j A/"(0, V [h - n(h)]/ir(p)) . 

To that purpose, consider the following decomposition: 

TV 

N- 1 ' 2 E Hx {t) ) ■= Aiv,i + Ajv, 2 + A N , 3 , 



t=i 



where M* N := [Nir(p)\, 



M N +l, 



/ M N \ 

/ M N M% 



8=1 1=1 



AiV,3 :=^" 1/2 E^)- 



i=l 



Using that < N - YsiL\ €i ^ Hi N +i and Markov's inequality, 

p /|A , . x < ExCS^+il^+i)!) Q M - +1 (x, |fc|/p) 

^(.lAjv,!! > e) < 



which converges in probability to using that \h\/p < h 2 /p 2 + 1 and 
{h 2 /p 2 , 1} C C^. Thus, Ajv,i — ► 0. We now consider Ajy,2- Note that 



(14) P.dA^al > e) < P^d^jvl > eViV/2) + P,^! > ey/N/2) 
with 

M N \JM* N M N VM* N 

A N = E Kli)/P{ti) and 5at= E {Q-Vpia))hiM 

i=M N AM^ i=M N AM^ 
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Now, pick an arbitrary a £ (0, 1) and set M N := M N (1 — a) and Mn : = 

F 

Mtr(l + a). Since M^/N — > n(p), for all i] > 0, there exists iVo such that 
for all N > N , F X (M N <M N < M N ) > l-rj. Then, obviously for N > N , 
the first term of the rhs of (14) is bounded by 



X (\A N \ >eVN/2)<r] + 



sup 

M* N <i<M N 



h (h)/p(h 

j=M* N 



(15) 



+ : 



sup 



Mi 



> eVN/2j 



> eVN/2 ] 



Using (11), the second term of the rhs is bounded by 

4M N - M* N E x [C h/p ( U i* N )]/e 2 N , 

which converges to 4air(p)Tc(C h / p ) /e 2 as N goes to infinity using that C h / p G 
Cj). The resulting bound can thus be arbitrarily small as a goes to 0. Sim- 
ilarly, one can bound the third term of the rhs of (15) and let iV~ go to 

t — F 

infinity. Letting again a go to 0, we obtain that A^/yN — > 0. Similarly, 
the second term of the rhs of (14) is bounded by 



F X (\B N \ > eVN/2)<7 1 + 

(16) +P 3 
Denote Rn = Y^eLi (I? 



sup 



E {« 

j= M N 



sup 

M_* N <i<M* N 



Mi 



3=1 



P(h 



Kh 



> eVN/2j 



> eVN/2 ] 



l 

p{u) 



hfa). Clearly, (Rn) is a J-"- martingale where 

J 7 = (Fi)i>i and ?i is the <r-field J", := a fa, . . ., 3i+i, £° , ■ ■ ■ > Then, by 
Kolmogorov's inequality, one can bound the second term of (16) in the fol- 
lowing way: 



sup 

,M*<i<M N 



\Ri-RM N \>eVN/2) <4- 



(R 



Mi 



Rm n Y 



e 2 N 



£ 2 N 



E 

i=M* 



1 -Pfa) 

P 2 (h) 



h 2 fa) 



A(M N -M N + 1) E£m* Q l ( x > ^ 



^h 2 



e 2 N 



M N -M*+l 



4avr f ^/i 2 
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which can be arbitrarily small as a goes to 0. Similarly, one can bound the 
third term of (16) and let N go to infinity. Finally, letting a go to 0, we 

I — F IP 

obtain that B^/yN — > 0. Thus, Atv,2 — > 0. Finally, by Slutsky's lemma, 

A N , 3 := (N/M* N r 1/2 ^M^ ^ AA(0, V [h - *(h)]/x(p)) . 

V M n 

The proof now stands completed. □ 

Note that the above analysis also provides us with a universal control 
variate for Metropolis-Hastings algorithms. Indeed, while Lemma 2 shows 
that 

oo 
j=l £<j 

is an unbiased estimator of l/p(3i), a simple independent estimator of p{ii) 
is provided by a(fc, yo) when yo is an independent draw from q{Y\%i). While 
the variation in this estimate may result in a negligible improvement in the 
control variate estimation, it is nonetheless available for free in all settings 
and should thus be exploited. 

4. Illustrations. We first consider a series of toy examples to assess 
the possible gains brought by the essentially free Rao-Blackwellisation. Our 
initial example is a random walk Metropolis-Hastings algorithm with target 
the jV(0, 1) distribution and with proposal q(y\x) = <p(x — y; r) a normal 
random walk with scale r. The acceptance probability is then the ratio of 
the targets and Figure 1 illustrates the gain provided by the Rao-Blackwel- 
lisation scheme by repeating the simulation 250 times and by representing 
the 90% range as well as the whole range of both estimators. The gain 
provided by the Rao-Blackwellisation is not huge wrt to the overlap of both 
estimates, but one must consider that the variability of the estimator 5 is 
due to two sources of randomness, one due to the tij's and the other one 
to the 3i's. In addition, the gain forecasted by the above developments is 
in terms of variance, not of tails, and this gain is illustrated by Table 1. 
We provide in this table the ratio of the empirical variances of the terms 
T\ih($i) and £ih($i) for several functions h. The minimal gains when r = .1 
are explained by the fact that the acceptance probability is almost one with 
such a small scale, while the higher rejection rate of 82% when r = 7 leads 
to more improvement in the variances, because of an higher variability in 
the original rij's. Note that the last column of Table 1 estimates E[p(x)] via 
an additional draw from q(Y\fo) as pointed out at the end of the previous 
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h(x) 


a; 




lj£>0 


p(as) 


T = .l 


0.971 


0.953 


0.957 


0.207 


T = 2 


0.965 


0.942 


0.875 


0.861 


r = 5 


0.913 


0.982 


0.785 


0.826 


r = 7 


0.899 


0.982 


0.768 


0.820 



Table 1 

Ratios of the empirical variances of the components of the estimators 5°° and 5 of 
K[h(X)] for 100 MCMC iterations over 10 3 replications, in the setting of a random walk 
Gaussian proposal with scale t, when started with a normal simulation. 





median 


mean 


9.8 


9.9 


time 


T 


= .1 


1.0 


6.49 


5.0 


11 


2.33 


T 


= 2 


0.0 


7.06 


4.3 


11 


6.5 


T 


= 5 


0.0 


9.02 


4.6 


13 


8.4 


T 


= 7 


0.0 


9.47 


4.8 


13 


3.5 



Table 2 

Evaluations of the additional computing effort due to the use of the Rao~Blackwell 
correction: median and mean numbers of additional iterations, 80% and 90% quantiles 
for the additional iterations, and ratio of the average R computing times obtained over 
10 simulations in the same setting as Table 1. 



section. Table 2 gives an evaluation of the additional time required by the 
Rao-Blackwellisation, even though this should not be over-interpreted. As 
shown by both the difference between the median and the mean additional 
times and the variability of the increase in the R computing time, despite 
the use of 10 5 replications, the occurence of a few very lengthy runs explains 
for the apparently much higher computing times. Note that this difficulty 
with very long runs can be completely bypassed when using a truncated 
version S k instead of the unconstrained version S°°. 

Our second example is an independent Metropolis-Hastings algorithm 
with target the M(0, 1) distribution and with proposal a Cauchy C(0, .25) 
distribution. The outcome is quite similar, if producing a slightly superior 
improvement, as shown on Figure 2. Table 3 also indicates much more clearly 
that the gains in variance can be substantial. Once again, Table 4 shows that 
the computing time may vary quite widely due to a few outlying instances 
of late acceptance. 

Our third example is an independent Metropolis-Hastings algorithm with 
target the £xp(\) distribution and with proposal the £xp(fj,) distribution. 
In this case, the probability functions p(x) in (2) and r(x) in Proposition 1 
can be derived in closed form as 

p t x ) = l-±ZJL e -i>* and r(x) = i _ 2(A ~ ^ e -m . 
A 2A — 

This special case means that we can compare the variability of the original 
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n ^ 1 1 1 T 

200 400 600 800 1000 
iterations 



Fig 1 . Overlay of the variations of 250 iid realisations of the estimates 8 (gold) and S°° 
(grey) o/E[X] = for 1000 iterations, along with the 90% interquantile range for the 
estimates S (brown) and S°° (pink), in the setting of a random walk Gaussian proposal 
with scale r = 10. 



h(x) 


X 


x 2 


Ix>o 


p(x) 


T = 


.25 


0.677 


0.630 


0.663 


0.599 


r — 


.5 


0.790 


0.773 


0.716 


0.603 


r — 


1 


0.937 


0.945 


0.889 


0.835 


r — 


2 


0.781 


0.771 


0.694 


0.591 



Table 3 



Ratios of the empirical variances of the components of the estimators S°° and 5 of 
~E[h(X)] for 100 MCMC iterations over 10 replications, in the setting of an independent 
Cauchy proposal with scale t started with a normal simulation. 
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Fig 2. Overlay of the variations of 250 iid realisations of the estimates 8 (gold) and 5°° 
(grey) o/ELY] = for 1000 iterations, along with the 90% interquantile range for the 
estimates 8 (brown) and <5°° (pink), in the setting of an independent Cauchy proposal with 
scale 0.25. 





median 


mean 


<2.8 


q.9 


time 


r — 


.25 


0.0 


8.85 


4.9 


13 


4.2 


r — 


.50 


0.0 


6.76 


4 


11 


2.25 


r — 


1.0 


0.25 


6.15 


4 


10 


2.5 


t — 


2.0 


0.20 


5.90 


3.5 


8.5 


4.5 


Table 4 



Evaluations of the additional computing effort due to the use of the Rao-Blackwell 
correction: median and mean numbers of additional iterations, 80% and 90% quantiles 
for the additional iterations, and ratio of the average R computing times obtained over 
10° simulations in the same setting as Table 3. 
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h(x) 


X 


x 2 


Ix>i 


p(x) 


ft = 


9 


0.933 


0.953 


0.939 


0.238 






0.787 


0.774 


0.859 


0.106 


A* = 


5 


0.722 


0.807 


0.759 


0.591 






0.291 


0.394 


0.418 


0.285 


A* = 


3 


0.671 


0.738 


0.705 


0.657 






0.131 


0.175 


0.263 


0.295 


M = 


1 


0.641 


0.700 


0.676 


0.703 






0.0561 


0.0837 


0.159 


0.289 


Table 5 



Ratios of the empirical variances of the components of the estimators 5 and 8°° of 
K[h(X)] for 100 MCMC iterations over 10 3 replications, in the setting of an independent 
exponential proposal with scale started with an exponential £xp(l) simulation from the 

target distribution. The second row is the optimal gain obtained by using l/p(2i) as 
importance weight, i.e. the importance sampling estimator (4). 

Metropolis-Hastings estimator with its Rao-Blackwellised version 6™, but 
also with the optimal importance sampling version shown in (4). As illus- 
trated by Table 5, the gain brought by the Rao-Blackwellisation is significant 
even when compared with the reduction in variance of the optimal impor- 
tance sampling version. Obviously the most extreme case of ft = 0.1 shows 
that the ideal importance sampling estimator (4) could bring considerable 
improvement, were it available. 

Our fourth and final toy example is a geometric Qeo{(3) target associated 
with a one-step random walk proposal: 

vr(x) = 0(1 - pr ^d 2q(y\x) = JV^ 1 '? X>0 > 

[%\<1 ltX = 0. 

For this problem, 

p(x) = 1-/3/2 and r(x) = 1-/3 + f3 2 /2 . 

We can therefore compute the gain in variance 

p(x) - r(x) 2 - p(x) = /3(1 - /3)(2 + g) 
2p(x)-r(x) p 2 (x) (2 -/3 2 ) (2-/3) 2 

which is optimal for (5 = 0.174, leading to a gain of 0.578 while the relative 
gain in variance is 

p(x) - r(x) 2 - p(x) _ (1 - /3)(2 + f3) 
2p(x) - r(x) l-p(x) ~ (2 - /3 2 ) 

which is decreasing in (3. 
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HP) 


ft 


P2 


I/3 2 >0.5 


T = 


01 


0.523 


0.516 


0.944 






0.999 


0.999 


0.996 


T = 


05 


0.481 


0.518 


0.877 






0.864 


0.888 


0.929 


T = 


1 


0.550 


0.555 


0.896 






0.749 


0.748 


0.765 


T = 


2 


0.562 


0.568 


0.845 






0.532 


0.527 


0.620 


T = 


5 


0.556 


0.565 


0.778 






0.412 


0.433 


0.479 



Table 6 



Ratios of the empirical variances of the components of the estimators 8 and <5°° of 
E[/i(/3)] for 10 4 MCMC iterations, in the setting of a random walk proposal with scale t 

started from the MLE estimate of /3 applied to the Pima Indian diabetes study. The 
second row for each value of t is the additional improvement in the empirical variances 
resulting from using the control variate. 



We now apply the Rao-Blackwellisation to a probit modelling of the Pima 
Indian diabetes study (Venables and Ripley, 2002). The dataset we consider 
covers a population of 332 women who were at least 21 years old, of Pima 
Indian heritage and living near Phoenix, Arizona. These women were tested 
for diabetes according to World Health Organization (WHO) criteria. The 
data were collected by the US National Institute of Diabetes and Digestive 
and Kidney Diseases, and is available with the basic R package. The goal is 
to explain the diabetes variable in terms of the body mass index. We use 
a standard representation of the diabetes binary variables & as indicators 
Hi = Izi>o of latent variables Zi, Zi\f3 ~ M (x^/3, l), associated with a stan- 
dard regression model, i.e. where the Xj's are p-dimensional covariates and 
j3 is the vector of regression coefficients. Given j3, the y^s are independent 
Bernoulli rv's with P(yj = l|/3) = $ ( x i^) where $ is the standard normal 
cdf. The choice of a prior distribution for the probit parameter f3 is open to 
debate (Marin and Robert, 2007) but, for illustration purposes, we opt for a 
fiat prior. The Metropolis-Hastings algorithm associated with the posterior 
is a simple two-dimensional random walk proposal with a single scale r, due 
to the normalisation of the body mass index. Simulations based on differ- 
ent scales r show significant improvements in the variance of the terms of 
5 and 5oo by a factor of 2. If we consider in addition the possible improve- 
ment brought by the control variate indicated at the end of the previous 
section, the regression coefficient can be obtained by a simple regression of 
£,ih{ii) over £ia(3i, yo) and Table 6 shows that this additional step brings a 
significant improvement over the Rao-Blackwellised version. 
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